155 research outputs found

    Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation

    Large Language Models (LLMs) have made progress in various real-world tasks, which stimulates requirements for the evaluation of LLMs. Existing LLM evaluation methods are mainly supervised signal-based: they depend on static datasets and cannot evaluate the ability of LLMs in dynamic real-world scenarios where deep interaction widely exists. Other LLM evaluation methods are human-based; they are costly and time-consuming and are incapable of large-scale evaluation of LLMs. To address the issues above, we propose a novel Deep Interaction-based LLM-evaluation framework. In our proposed framework, LLMs' performance in real-world domains can be evaluated from their deep interaction with other LLMs in elaborately designed evaluation tasks. Furthermore, our proposed framework is a general evaluation method that can be applied to a host of real-world tasks such as machine translation and code generation. We demonstrate the effectiveness of our proposed method through extensive experiments on four elaborately designed evaluation tasks.

    Combating tracking drift : developing robust object tracking methods

    University of Technology Sydney. Faculty of Engineering and Information Technology. Visual object tracking plays an important role in many computer vision applications, such as video surveillance, unmanned aerial vehicle image processing, human-computer interaction and automatic control. This research aims to develop robust object tracking methods that are capable of tracking general objects without prior knowledge of the target. Tracker drift is one of the most challenging issues in object tracking due to target deformations, illumination variations, abrupt motions, occlusions and background clutter. This thesis focuses on the tracking drift problem and adopts three main solutions: designing an efficient target shape feature extraction method, comparing target features with metric learning, and using the ensemble tracking method to tackle tracking drift during online tracker updates.

    Deep Cooking: Predicting Relative Food Ingredient Amounts from Images

    In this paper, we study the novel problem of not only predicting ingredients from a food image, but also predicting the relative amounts of the detected ingredients. We propose two prediction-based models using deep learning that output sparse and dense predictions, coupled with important semi-automatic multi-database integrative data pre-processing, to solve the problem. Experiments on a dataset of recipes collected from the Internet show that the models generate encouraging results.

    Algorithmic subsampling under multiway clustering

    This paper proposes a novel method of algorithmic subsampling (data sketching) for multiway cluster dependent data. We establish a new uniform weak law of large numbers and a new central limit theorem for the multiway algorithmic subsample means. Consequently, we discover an additional advantage of algorithmic subsampling: it allows for robustness against potential degeneracy, and even non-Gaussian degeneracy, of the asymptotic distribution under multiway clustering. Simulation studies support this novel result, and demonstrate that inference with algorithmic subsampling is more accurate than inference without it. Applying these basic asymptotic theories, we derive the consistency and the asymptotic normality for the multiway algorithmic subsampling generalized method of moments estimator and for the multiway algorithmic subsampling M-estimator. We illustrate an application to scanner data.

    Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing

    Despite increasing interest in the automatic detection of media frames in NLP, the problem is typically simplified as single-label classification and adopts a topic-like view on frames, evading modelling the broader document-level narrative. In this work, we revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives, including conflict and its resolution, and integrate it with the narrative framing of key entities in the story as heroes, victims or villains. We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions, and present an annotated data set of English news articles, and a case study on the framing of climate change in articles from news outlets across the political spectrum. Finally, we explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches, and present a novel retrieval-based method which is both effective and transparent in its predictions. We conclude with a discussion of opportunities and challenges for future work on document-level models of narrative framing. Comment: To appear in ACL 2023 (main conference)

    Models of preconception care implementation in selected countries.

    Globally, maternal and child health faces diverse challenges depending on the status of the development of the country. Some countries have introduced or explored preconception care for various reasons. Falling birth rates and increasing knowledge about risk factors for adverse pregnancy outcomes led to the introduction of preconception care in Hong Kong in 1998, and South Korea in 2004. In Hong Kong, comprehensive preconception care including laboratory tests is provided to over 4000 women each year at a cost of $75 per person. In Korea, about 60% of the women served have a known medical risk history, and the challenge is to expand the program capacity to all women who plan pregnancy and to conduct social marketing. Belgium has established an ad hoc committee to develop a comprehensive social marketing and professional training strategy for pilot testing preconception care models in the French-speaking part of Belgium, an area that represents 5 million people and 50,000 births per year, using prenatal care and pediatric clinics, gynecological departments, and the genetic centers. In China, Guangxi province piloted preconceptional HIV testing and counseling among couples who sought the then mandatory premarital medical examination as a component of the three-pronged approach to reduce mother-to-child transmission of HIV. HIV testing rates among couples increased from 38% to 62% over a one-year period. In October 2003, China changed the legal requirement of premarital medical examination from mandatory to "voluntary." Most women interpreted this change to mean that the premarital health examination was "unnecessary," and overall premarital health examination rates dropped. Social marketing efforts piloted in 2004 indicated that 95% of women were willing to pay up to RMB 100 (US$12) for preconception health care services. These case studies illustrate the programmatic feasibility of preconception care services to address maternal and child health and other public health challenges in developed and emerging economies.

    ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing

    Sketch-and-extrude is a common and intuitive modeling process in computer-aided design. This paper studies the problem of learning the shape given in the form of point clouds by inverse sketch-and-extrude. We present ExtrudeNet, an unsupervised end-to-end network for discovering sketch and extrude from point clouds. Behind ExtrudeNet are two new technical components: 1) an effective representation for sketch and extrude, which can model extrusion with freeform sketches and conventional cylinder and box primitives as well; and 2) a numerical method for computing the signed distance field which is used in the network learning. This is the first attempt that uses machine learning to reverse engineer the sketch-and-extrude modeling process of a shape in an unsupervised fashion. ExtrudeNet not only outputs a compact, editable and interpretable representation of the shape that can be seamlessly integrated into modern CAD software, but also aligns with the standard CAD modeling process facilitating various editing applications, which distinguishes our work from existing shape parsing research. Code is released at https://github.com/kimren227/ExtrudeNet. Comment: Accepted to ECCV 202